Beyond support in two-stage variable selection

Authors

  • Jean-Michel Bécu
  • Yves Grandvalet
  • Christophe Ambroise
  • Cyril Dalmasso
Abstract

Numerous variable selection methods rely on a two-stage procedure, where a sparsity-inducing penalty is used in the first stage to predict the support, which is then conveyed to the second stage for estimation or inference purposes. In this framework, the first stage screens variables to find a set of possibly relevant variables, and the second stage operates on this set of candidate variables to improve estimation accuracy or to assess the uncertainty associated with the selection of variables. We argue that more information can be conveyed from the first stage to the second: we use the magnitude of the coefficients estimated in the first stage to define an adaptive penalty that is applied at the second stage. We give the example of an inference procedure that benefits greatly from the proposed transfer of information. The procedure is analyzed precisely in a simple setting, and our large-scale experiments empirically demonstrate that actual benefits can be expected in much more general situations, with sensitivity gains ranging from 50% to 100% compared to the state of the art.
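The idea of passing coefficient magnitudes, and not only the support, to the second stage can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the first stage is a Lasso solved by coordinate descent, the second stage is a ridge regression whose per-variable penalty is inversely proportional to the first-stage magnitude, the two stages use separate halves of the data, and the names `lasso_cd` and `adaptive_ridge` and the penalty levels are hypothetical.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent on (1/2n)||y - Xb||^2 + lam*||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()  # residual y - X @ beta, with beta = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += X[:, j] * beta[j]        # drop coordinate j from the fit
            rho = X[:, j] @ resid / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid -= X[:, j] * beta[j]        # put the updated coordinate back
    return beta

def adaptive_ridge(X, y, beta_init, lam):
    """Ridge on the first-stage support with penalty weights 1/|beta_init| (assumed form)."""
    n, p = X.shape
    support = np.flatnonzero(beta_init)
    weights = 1.0 / np.abs(beta_init[support])   # large first-stage coefficient -> weak penalty
    Xs = X[:, support]
    gram = Xs.T @ Xs / n + lam * np.diag(weights)
    coef = np.linalg.solve(gram, Xs.T @ y / n)
    beta = np.zeros(p)
    beta[support] = coef
    return support, beta

# Synthetic data: 3 relevant variables out of 50 (illustrative values only).
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[:3] = [3.0, -2.0, 1.5]
y = X @ true_beta + 0.5 * rng.standard_normal(n)

half = n // 2
beta_stage1 = lasso_cd(X[:half], y[:half], lam=0.2)            # stage 1: screening
support, beta_stage2 = adaptive_ridge(X[half:], y[half:],      # stage 2: re-estimation
                                      beta_stage1, lam=0.05)
```

In this sketch a variable that barely survived screening receives a heavy second-stage penalty, whereas a support-only transfer would treat all selected variables alike.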

Related articles

Application of genetic algorithm (GA) to select input variables in support vector machine (SVM) for analyzing the occurrence of roach, Rutilus rutilus, in streams

Support vector machine (SVM) was used to analyze the occurrence of roach in Flemish stream basins (Belgium). Several habitat and physico-chemical variables were used as inputs for the model development. The biotic variable merely consisted of abundance data which was used for predicting presence/absence of roach. Genetic algorithm (GA) was combined with SVM in order to select the most important...

Highly Efficient In Vitro Production of Bovine Blastocyst in Cell-Free Sequential Synthetic Oviductal Fluid vs. TCM199 Vero Cell Co-Culture System

Background: The aim of this study was to establish a cell-free sequential culture system that can support high levels of in vitro embryo development and blastocyst formation from bovine zygotes. To this end, this investigation was carried out to evaluate the effects of glucose, serum and EDTA on bovine zygote in vitro development. Materials and Methods: Bovine presumptive zygotes were derived from ...

Two-stage Production Systems under Variable Returns to Scale Technology: A DEA Approach

Data envelopment analysis (DEA) is a non-parametric approach for performance analysis of decision making units (DMUs), which use a set of inputs to produce a set of outputs, without the need to consider the internal operations of each unit. In recent years, various studies have dealt with two-stage production systems, i.e. systems which consume some inputs in their first stage to produce ...

Application of tests of goodness of fit in determining the probability density function for spacing of steel sets in tunnel support system

One of the conventional methods for temporary support of tunnels is to use steel sets with shotcrete. The nature of a temporary support system demands quick installation of its structures. As a result, the spacing between steel sets is not fixed and can be considered a random variable. Hence, in the reliability analysis of these types of structures, the selection of an appropri...

Multi-Stage Variable Selection: Screen and Clean

This paper explores the following question: what kind of statistical guarantees can be given when doing variable selection in high dimensional models? In particular, we look at the error rates and power of some multi-stage regression methods. In the first stage we fit a set of candidate models. In the second stage we select one model by cross-validation. In the third stage we use hypothesis test...

Journal:
  • Statistics and Computing

Volume 27

Publication date: 2017